enable inference benchmark #5933
benchmark/paddle/image/run_mkldnn.sh
--use_gpu=False \
--trainer_count=$thread \
--log_period=10 \
--config_args="batch_size=${bs},layer_num=${layer_num},is_test=True" \
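is_test=True: none of the three networks has this parameter.
The inference network differs from the training network; please adjust them accordingly.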
This variable is mainly used by batch_norm in resnet: at inference time, use_global_stats
needs to be true.
https://github.com/PaddlePaddle/Paddle/blob/develop/benchmark/paddle/image/resnet.py#L8-L9
https://github.com/PaddlePaddle/Paddle/blob/develop/benchmark/paddle/image/resnet.py#L47-L48
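A minimal sketch, assuming the run script builds --config_args in one place: is_test=True is appended only in inference mode, so the training configuration stays unchanged while batch_norm in resnet.py can switch to use_global_stats. The helper name build_config_args and its "infer" mode string are hypothetical, not part of run_mkldnn.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build_config_args MODE BATCH_SIZE LAYER_NUM
# Prints the value for --config_args; is_test=True is added only for inference.
build_config_args() {
  local mode=$1 bs=$2 layer_num=$3
  local args="batch_size=${bs},layer_num=${layer_num}"
  if [ "$mode" = "infer" ]; then
    # Lets the network config turn on use_global_stats in batch_norm.
    args="${args},is_test=True"
  fi
  echo "$args"
}
```

Keeping the flag out of the training string means the same helper can serve both the train and inference paths without touching the training results.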
At inference time there is no cost layer, so all of the networks need to be adjusted.
benchmark/paddle/image/run_mkldnn.sh
--save_dir="models/${topology}-${layer_num}" \
--config_args="batch_size=128,layer_num=${layer_num}" \
> /dev/null 2>&1
echo "Done"
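- Don't benchmark inference right after training; that makes measuring inference performance far too slow.
- The network used for inference doesn't need to be well trained. Since we only measure performance, a model saved after any batch will do.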
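Currently the script trains a model for inference only if it finds no trained model locally.
That model is trained for just one pass (num_pass). Because the dummy data has only 1024 images, training is quick and happens only once; later inference runs for the same network all reuse that model, so the overall impact is small.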
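The reuse-a-cached-model logic described above could look roughly like this. It is a sketch assuming the model directory follows the models/${topology}-${layer_num} naming used by --save_dir; ensure_model is a hypothetical helper, and the training command is passed in as arguments rather than hard-coded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ensure_model MODEL_DIR TRAIN_CMD [ARGS...]
# Runs the training command only if MODEL_DIR does not exist yet;
# otherwise the existing model is reused for inference.
ensure_model() {
  local model_dir=$1
  shift
  if [ -d "$model_dir" ]; then
    echo "reuse $model_dir"
    return 0
  fi
  # No cached model: run the supplied training command once.
  "$@"
}
```

Only the directory's existence decides whether training runs, so a partially written directory from an interrupted run would be reused; checking for a specific parameter file inside it would be stricter.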
Could inference be written as a separate script?
benchmark/paddle/image/run_mkldnn.sh
@@ -30,13 +30,74 @@ function train() {
2>&1 | tee ${log}
}

if [ ! -d "train.list" ]; then
function test() {
Rename test -> inference.
OK, no problem.
Done
I pulled the script down and ran it; the printed log is as follows:
It lacks the timing statistics that training used to print.
--log_period=32 can be increased a bit, e.g. to 100.
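Yes, training prints statistics because that is hard-coded in the code. If needed, I can find a way to compute them when inference finishes, but the first few batches must be excluded as burn-in (warm-up) time. The log period can be adjusted, but it has to follow the batch size setting; I can change it together with the above.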
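One way to compute a final FPS while discarding burn-in, as proposed above. This is a sketch assuming the per-batch wall time in milliseconds can be extracted from the log, one value per line; compute_fps and that input format are assumptions, and PaddlePaddle's actual log format is not reproduced here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute_fps BATCH_SIZE SKIP_BATCHES < times_ms.txt
# Reads one per-batch time (milliseconds) per line, drops the first
# SKIP_BATCHES lines as burn-in, and prints images per second.
compute_fps() {
  local bs=$1 skip=$2
  awk -v bs="$bs" -v skip="$skip" '
    NR > skip { total_ms += $1; n++ }
    END {
      if (n == 0 || total_ms <= 0) { print "no samples"; exit 1 }
      # images processed divided by seconds elapsed
      printf "%.2f\n", bs * n / (total_ms / 1000.0)
    }'
}
```

For example, `printf '250\n40\n40\n40\n40\n' | compute_fps 64 1` skips the slow first batch and reports 1600.00 images per second (256 images in 0.16 s).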
Done. The final output will look like the following, with only 10 log lines per case:
A final FPS value is printed at the end.
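To get roughly 10 log lines per case regardless of batch size, the log period can be derived from the number of batches. A sketch, assuming each case runs over a fixed number of images (for instance the 1024 dummy images mentioned earlier); pick_log_period is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick_log_period NUM_IMAGES BATCH_SIZE [NUM_LOGS]
# Prints a --log_period value that yields about NUM_LOGS log lines per case.
pick_log_period() {
  local num_images=$1 bs=$2 num_logs=${3:-10}
  # ceil(num_images / bs) batches per pass
  local batches=$(( (num_images + bs - 1) / bs ))
  local period=$(( batches / num_logs ))
  if [ "$period" -lt 1 ]; then
    # Fewer batches than requested logs: log every batch.
    period=1
  fi
  echo "$period"
}
```

It could be used as `--log_period=$(pick_log_period 1024 "$bs")`; with bs=16 that gives 64 batches and a period of 6, i.e. 10 log lines.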
Looks nice!
fix #5911